CVC-UAB's Participation in the Flowchart Recognition Task of CLEF-IP 2012

Authors

  • Marçal Rusiñol
  • Lluís-Pere de las Heras
  • Joan Mas Romeu
  • Oriol Ramos Terrades
  • Dimosthenis Karatzas
  • Anjan Dutta
  • Gemma Sánchez
  • Josep Lladós
Abstract

This document describes the methods we used in the flowchart recognition task of the CLEF-IP 2012 track. The flowchart recognition task consisted of interpreting flowchart line-drawing images. Participants were asked to extract as much structural information as possible from these images and return it in a predefined textual format for further processing for the purpose of patent search. The Document Analysis Group from the Computer Vision Center (CVC-UAB) has been actively working on Graphics Recognition for over a decade. Our main aim in participating in the CLEF-IP flowchart recognition task is to test our graphics recognition architectures on this type of graphics understanding problem. Our recognition system has a modular architecture in which each module tackles a different step of the flowchart understanding problem. A text/graphic separation technique is applied to separate the textual elements from the graphical ones. An OCR engine is applied to the text layer, while on the graphical layer we identify nodes and edges as well as their relationships. We have proposed two different families of node and edge segmentation modules, one dealing with the raw pixel data and another working in the vectorial domain. The locations of the identified nodes are fed to the recognizer module, which is in charge of categorizing each node's type. We have proposed two different node descriptors for the recognizer module. The edge analysis module analyzes the connections between nodes and categorizes the edge style. Finally, a post-processing module is applied in order to correct some syntactic errors. We have submitted four different runs by combining the two variants of the segmentation module with the two variants of the recognition module.
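As a rough illustration of how the four submitted runs arise from the two segmentation variants and the two recognition variants, the combination can be sketched as a Cartesian product. This is a minimal sketch, not the authors' implementation: the stage labels, the descriptor names `descriptor_1` and `descriptor_2`, and the `RunConfig` class are hypothetical, since the abstract does not name the descriptors or describe any code.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical labels. The abstract names the two segmentation families
# (raw pixel data vs. vectorial domain) but not the two node descriptors.
SEGMENTERS = ("pixel", "vectorial")
DESCRIPTORS = ("descriptor_1", "descriptor_2")

# Stage order as listed in the abstract (labels are illustrative only).
STAGES = (
    "text_graphic_separation",
    "ocr_on_text_layer",
    "node_edge_segmentation",
    "node_recognition",
    "edge_analysis",
    "post_processing",
)


@dataclass(frozen=True)
class RunConfig:
    """One run: a segmentation variant paired with a node descriptor."""
    segmenter: str
    descriptor: str

    @property
    def name(self) -> str:
        return f"{self.segmenter}+{self.descriptor}"


def build_runs() -> list[RunConfig]:
    """Pair every segmentation variant with every descriptor variant."""
    return [RunConfig(s, d) for s, d in product(SEGMENTERS, DESCRIPTORS)]


runs = build_runs()
for run in runs:
    print(run.name)
```

Two variants of one module times two variants of another give 2 x 2 = 4 distinct configurations, which matches the four runs mentioned in the abstract.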

Related resources

Textual Summarisation of Flowcharts in Patent Drawings for CLEF-IP 2012

The CLEF-IP 2012 track included the Flowchart Recognition task, an image-based task where the goal was to process binary images of flowcharts taken from patent drawings to produce summaries containing information about their structure. The textual summaries include information about the flowchart title, the box-node shapes, the connecting edge types, text describing flowchart content and the st...

Visual Structure Analysis of Flow Charts in Patent Images

This report presents the work carried out for the flow chart recognition task in the course of the CLEF-IP 2012 competition. The goal is to obtain structural information of flow charts based on the visual content of the images. To this end, for each flow chart a list of its nodes and their interconnections, i.e. its edges, is extracted and the type of the nodes and edges and attached text is re...

Optical Structure Recognition Application Entry to CLEF-IP 2012

We present our entry to the CLEF 2012 Chemical Structure Recognition task. Our submission includes runs for both the bounding box extraction and the molecule structure recognition tasks using the Optical Structure Recognition Application (OSRA). OSRA is an open source utility that converts images of chemical structures into connection tables in established computerized molecular formats. It has been under constant devel...

Patent Terminology Analysis: Passage Retrieval Experiments for the Intellectual Property Track at CLEF

In 2012, the University of Hildesheim participated in the CLEF-IP claims-to-passage task. Four runs were submitted and different approaches were tested. The tested approaches included a language-independent trigram search approach, one approach formulating a query in the source language only, and another approach with queries translated to English, German, French and Spanish. The results were not satisfa...

Innovandio S.A. at CLEF-IP 2013

Patentability and novelty search is an essential part of any patent application. It ensures that the idea that should be patented has not already been registered anywhere else in the world. However, this task is complicated by the large number of documents and the fact that they are written in many different languages. In this paper we survey four approaches that will help to automate the task ...


Publication date: 2012